Improving Protein Localization Prediction Using Amino Acid Group Based Physichemical Encoding
Abstract
Computational prediction of protein localization is a common way to characterize the functions of newly sequenced proteins. Sequence features such as amino acid (AA) composition have been widely used for subcellular localization prediction because of their simplicity, but they suffer from low coverage and low prediction accuracy. We present a physicochemical encoding method that maps protein sequences into feature vectors composed of the locations and lengths of amino acid groups (AAGs) with similar physicochemical properties. This high-level modular representation of protein sequences overcomes the loss of order information in the commonly used AA composition and AA pair composition encodings. Applied with SVM classifiers, AAG-based features achieve higher prediction accuracy (up to 20% improvement) than the widely used AA composition and AA pair composition in differentiating proteins of different localizations. When the AAG and AA composition encodings are combined, prediction accuracy improves further, indicating a synergistic effect.
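To make the contrast between the two kinds of encoding concrete, here is a minimal Python sketch. It is not the authors' implementation: the four-way residue grouping in `GROUPS`, the helper names, and the choice to keep only the longest run per group are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows that composition discards residue order, while run-based group features retain position and length information.

```python
from itertools import groupby

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping by physicochemical property (assumption, not from the paper).
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids; order information is lost."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def group_of(residue):
    """Name of the (assumed) group a residue belongs to, or 'other'."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    return "other"


def aag_runs(seq):
    """List (group, start, length) for each maximal run of same-group residues."""
    runs, pos = [], 0
    for name, chunk in groupby(seq, key=group_of):
        length = len(list(chunk))
        runs.append((name, pos, length))
        pos += length
    return runs


def aag_features(seq):
    """Per group: relative start and relative length of its longest run."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    best = {name: (0, 0) for name in GROUPS}
    for name, start, length in aag_runs(seq):
        if name in best and length > best[name][1]:
            best[name] = (start, length)
    features = []
    for name in GROUPS:
        start, length = best[name]
        features.extend([start / len(seq), length / len(seq)])
    return features
```

For example, `"KKDD"` and `"KDKD"` have identical `aa_composition` vectors but different `aag_runs`, which is the order information the abstract says composition-based encodings lose. Concatenating `aa_composition(seq)` and `aag_features(seq)` would give one combined vector per protein, in the spirit of the combined encoding mentioned above.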
Similar resources
Bioinformatics Analysis of Physichemical Properties of Protein Sorting Signals
Subcellular localization of proteins is usually guided by their sorting signals encoded by subsequences of amino acids at the N-terminal or C-terminal ends. These signals are usually composed of a set of physicochemically conserved amino acid groups such as the hydrophobic cores of secretory signal peptides. Using experimentally determined sorting signals, biologists have identified the physiche...
Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks
Background: Prediction of protein localization is among the most important problems in bioinformatics and is used to predict the location of proteins in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied for the prediction of the intracellular protein locations. These algorithms use the features extracted from pro...
Computational Prediction of the Effects of Single Nucleotide Polymorphisms of the Gene Encoding Human Endothelial Nitric Oxide Synthase
ABSTRACT Background and Objective: Genetic variations in the gene encoding endothelial nitric oxide synthase (eNOS) enzyme affect the susceptibility to cardiovascular disease. Identification of the way these changes affect eNOS structure and function in laboratory conditions is difficult and time-consuming. Thus, it seems essential to ...
Isolation and Characterization of a New Peroxisome Deficient CHO Mutant Cell Belonging to Complementation Group 12
We searched for novel Chinese hamster ovary (CHO) cell mutants defective in peroxisome biogenesis by an improved method using the peroxisome targeting sequence (PTS) of Pex3p (amino acid residues 1–40) fused to enhanced green fluorescent protein (EGFP). From mutagenized TKaEG3(1–40) cells, the wild-type CHO-K1 stably expressing rat Pex2p and rat Pex3p(1–40)-EGFP, numerous cell colonies resistant to...
Predicting protein-protein interactions based on rotation of proteins in 3D-space
Protein-Protein Interactions (PPIs) perform essential roles in biological functions. Although some experimental techniques have been developed to detect PPIs, they suffer from high false positive and high false negative rates. Consequently, efforts have been devoted during recent years to develop computational approaches to predict the interactions utilizing various sources of information. Ther...